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The influence of climate on biodiversity is an important ecological question. Vari- 
^ ■ ous theories try to link climate change to allelic richness and therefore to predict 

Cd I the impact of global warming on genetic diversity. We model the relationship be- 

tween genetic diversity in the European beech forests and curves of temperature 
and precipitation reconstructed from pollen databases. Our model links the genetic 
measure to the climate curves through a linear functional regression. The interac- 
tion in climate variables is assumed to be bilinear. Since the data are georeferenced, 
our methodology accounts for the spatial dependence among the observations. The 
practical issues of these extensions are discussed. 
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1 1 Introduction 

2 Climate records sliow tliat tlie eartii lias recorded a succession of periods of 

3 major warming and cooling at different time windows and scales [5|,[l2|. During 

4 the last post-glacial period (18000 years before the present), Europe recorded 

5 a 15°C to 20°C warming depending on the area. At the same period there was 

6 an expansion of all forest biomes and an upward movement of the tree-lines 

7 that reached an altitude 300 m higher than today. Although there is a wealth 

8 of paleodata and detailed climate reconstruction for the Holocene period, we 

9 still lack some knowledge as to how the warming was recorded and what the 

10 vegetation feedbacks were that affected local or regional past climates. Various 

11 theories try to link climate change to allelic richness and therefore to predict 

12 the impact of global warming on genetic diversity. 

13 In the recent literature there have been a lot of theoretical results for regres- 

14 sion models with functional data. Based on this framework, we used a linear 

15 functional model to model the relationship between genetic diversity in Euro- 

16 pean beech forests (represented by a positive number) and curves of temper- 

17 ature and precipitation reconstructed from the past. The classical functional 

18 regression model has been extended in two ways to account for our specific 

19 problem. First, as the effects of temperature and precipitation are far from 

20 independent we included an interaction term in our model. This interaction 

21 term appears as a bilinear function of the two predictors. Second, since we 

22 have spatial data there is dependence among the observations. To take into 

23 account with dependence the covariance matrix of the residuals is estimated 

24 in a spatial framework and plugged into generalized least-squares to estimate 

25 the parameters of the model. The practical difficulties of these extensions will 

26 be discussed. 

27 In Section 2, we present the genetic and climate data. The functional regression 

28 model is studied in Section 3. Results are presented and discussed in Section 4 
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and concluding remarks are given in Section 5. 



2 2 Data 

3 Pollen records are important proxies for the reconstruction of climate param- 

4 eters since variations in the pollen assemblages mainly respond to climate 

5 changes. Based on the fossil and surface pollen data from pollen databases, 

6 we used modern analogue technique (MAT) to reconstruct climate variables. 

7 Climate reconstruction is accomplished by matching fossil biological assem- 

8 blages to recently deposited (modern) pollen assemblages for which climate 

9 properties are known. The relatedness of fossil and modern assemblages is usu- 

10 ally measured using a distance metric that rescales multidimensional species 

11 assemblages into a single measure of dissimilarity. The distance-metric method 



a 



12 is widely used among paleoecologists and paleoceanographers [8|. Temperature 

13 and precipitation were reconstructed at 216 locations from the present back 

14 to a variable date depending on available data. The pollen dataset was used 

15 to reconstruct climate variables, throughout Europe for the last 15 000 years 

16 of the Quaternary. Due to the methodology, each climate curve is sampled at 

17 irregular times for each location. 

18 Genetic diversities were measured from variation at 12 polymorphic isozyme 

19 loci in European beech {Fagus sylvatica L.) forests based on an extensive 

20 sample of 389 populations distributed throughout the species range. Based 

21 on these data, various indices of diversity can be computed. They mainly 

22 characterize within or between population diversity. In this article, we focus on 

23 the H index, the probability that two alleles sampled at random are different. 

24 This parameter is a good indication of gene diversity [3||. 

25 The two datasets were collected independently and their locations do not 

26 coincide. 



1 3 Functional Regression 

2 The functional linear regression model with functional or real response has 

3 been the focus of various investigations [ll, lo, lZ|, [Ufl . We want to estimate the 

4 link between the real random response yi = d{si), the diversity at site Si and 

5 {Oi{t),ni{t))t>o the temperature and precipitation functions at site Sj. There 

6 are two points to consider for the modeling: (i) functional linear models need 

7 to be extended to incorporate interaction between climate functions; (ii) since 

8 we have spatial data, observations cannot be considered as independent and 

9 we also need to extend functional modeling to account for spatial correlation. 

We assume that the temperature and precipitation functions are square in- 
tegrable random functions defined on some real compact set [0,T]. The very 
general model can be written as: 

Y = fmt),7T{t))T>t>o)+e 

10 / is an unknown functional from L^([0, T]) x L^([0, T]) to M and e is a spatial 

11 stationary random field with correlation function p(.). 

12 We assume here that the functional / may be written as the sum of linear 

13 terms in 6{t) and 7r(t) and a bilinear term modeling the interaction 

/(^,7r) = /i+ / A(t)e(t)dt+ f B(t)TT(t)dt+ ff C{t,u)6{t)n{u)dudt 

J[0,r] J[0,T] J J[0,T]2 

= ^l+{A■e) + {B■^x) + {Ce■^x) 

14 by the Riesz representation of linear and bilinear forms. 

15 A and B are in L2([0, T]) and C is a kernel of /.^([O, T]). 

Let {ek)k>o be an orthonormal basis of L^([0,r]). Expanding all functions on 
this basis we get 

+ 00 +00 
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A{t) = J2 (^kekit) B{t) = ^bkekit) C{t,u) = ^ CMekit)ee{u) 

fc=l k=l k,e=i 
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yi = f^+Yl «fc"fc + Yl bkPl + Y. CkialPe + ^i 

k=l k=l k,l=l 



1 If the sums are truncated at k = i = K the problem results in a linear 

2 regression Y = fi + Xcf) + e with spatially correlated residuals with 
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dim(X) = nx {2K + K^ 
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cov(£i,ej) = p{si 



3 In order to estimate the regression and the correlation function parameters we 

4 proceed by Quasi Generalized Least Squares: a preliminary estimation of (j) is 

5 given by Ordinary Least Squares, (f)* = (X*X)~^X*F, the correlation function 

6 is estimated from the residuals e = Y — Xcf)* and the final estimate of (/) is 

7 given by plugging the estimated correlation matrix E in the Generalized Least 

8 Squares formula cj) = {X^T,~^X)~^X^J2~^Y. If both estimations of (p and S are 

9 convergent and assuming normal distribution of the residuals then [9||: 

V^U-cp) ^Af(0, lim n(X*S"^X)~^) 

10 The estimation of S is convergent under mild conditions [J| and the conver- 

11 gence of 6 is assessed for example when the functions are expanded on a splines 

. n . n 

12 basis [Jj or on a Karhunen expansion |10l |. 

13 Significance of the predictors can be tested if the residuals are assumed to be 

14 Gaussian, within the classical framework of linear regression models. 



15 Several parameters need to be set. The first choice is that of the orthonormal 



1 basis. It can be Fourier, splines, orthogonal polynomials, wavelets. Then the 

2 order of truncation has to be determined. The spatial correlation function 

3 of the residuals may be of parametric form (exponential, Gaussian, spherical 

4 etc.). These choices will be made by minimizing a cross validation criterion: a 

5 sample with no missing data for all variables is determined, and for each site of 

6 the sample a prediction of the diversity is computed according to parameters 

7 estimated without the site in the sample. The global criterion is the quadratic 

8 mean of the prediction error. 



9 4 Results 

10 Pollen was collected throughout Europe providing temporal estimation of tem- 

11 peratures and precipitation. These estimations are not regularly spaced, and 

12 have very different ranges from 1 Kyears to 15 Kyears. Beech genetic indices 

13 are recorded in forests and do not coincide with the pollen locations. Figure 

14 [U shows the locations of the two datasets. 

Measurement sites 




Figure 1. Locations of pollen (black dots) and genetic (open circles) records. 



1 Climate variables are continuous all over Europe but beech forests have specific 

2 locations. In order to make our data to spatially coincide, temperature and 

3 precipitation curves are firstly estimated on a regular grid of time from 15 

4 Kyears to present on sites where are collected the genetic measures. 15 Kyears 

5 corresponds to the beginning of migration of plants onto areas made free by 

6 the retreating ice sheets. 

7 The interpolation is done by a spatio-temporal kriging assuming the covariance 

8 function is exponential and separable. Figure [2] shows for a particular site the 

9 estimated temperature curve together with some neighboring curves issued 
10 from collected pollen. 

Spatio-temporal kriging of temperature 




Figure 2. Resulting temperature curve (thick black curve) from spatio-temporal 
kriging of 20 neighboring temperatures curves from recorded pollen. 



11 We aim to predict genetic diversity with precipitation and temperature curves. 

12 This corresponds to a functional regression model with genetic diversity as de- 

13 pendent variable and temperature and precipitation curves as predictor vari- 

14 able. The cross validation criterion gives better results with an expansion of 

15 the predictor variables on a Fourier basis of order 5. Figures [3] and [4] show the 

16 coefficient functions A, B, and kernel C. 



Function A 



Function B 





Figure 3. Coefficient function A of the temperature and coefficient function B of the 
precipitation 



Kernel C 




Figure 4. Kernel C of the interaction temperature-precipitation 



1 The shape of the coefficient function A shows that the term {A, 6) will be 

2 higher when the gap between periods before 7.5 Kyears and after 7.5 Kyears 



1 is higher (temperatures before 7.5 Kyears are mostly negative), meanwhile the 

2 shape of the coefficient function B shows that the term (-B, vr) will be higher 

3 when the precipitation before 7.5 Kyears is higher (precipitation is positive). 

4 The surface of kernel C is obviously not the product of two curves in the two 

5 coordinates, showing an effect of interaction. 

6 In Figure[5]the residual variogram graph exhibits some spatial dependence. An 

7 exponential variogram is fitted, and the resulting covariance matrix is plugged 

8 into the GLS formula to update the coefficients and test the effects of the 

9 temperature, precipitation and interaction. 
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Figure 5. Empirical and fitted variogram on the residuals. 



10 The graphs in Figure [6] show that the model explains a part of the diversity 

11 variability. However it is far from explaining all the variability as the R^ is 

12 equal to 0.31. 
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Figure 6. Observed-Predicted response and Predicted-Residuals graphs. 



Table [U gives the analysis of variance of the four nested models: 



2 Model 1 

3 Model 2 

4 Model 3 

5 Model 4 



E(y) = f,+ {A;9) + {B- n) + {06; n) 
E{Y) = ij+{A;9) + {B;7t) 
E{Y) = iJ+{A;9) 
E{Y) = /i + {B; n) 



Table 1 

Analysis of variance models of nested models 



6 The p-values (2.2e-16) of the tests Hq: model 3 (model 4) against Hi: model 

7 2 and (1.430e-07) of the test Hq: model 2 against Hi: model 1 show that the 

8 interaction and the two variables have a strong effect. 
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Temperature 
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Figure 7. Pattern of temperature curves according to the range of the predicted 
response. 



Precipitation 
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Figure 8. Pattern of precipitation curves according to the range of the predicted 
response. 

1 To give a better understanding of the regression model we divide the predicted 
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1 response range into 4 classes: less than 0.24, ]0.24;0.26], ]0.26;0.28] and greater 

2 than 0.28. Figures [7] and [8] show the shapes of temperature and precipitation 

3 curves for each class. When low (< 0.24) diversity is predicted, temperature 

4 curves are globally higher than the averaged temperature curves on all the 

5 sample. As the predicted diversity becomes higher the gap between the two 

6 periods before 7.5 Kyears and after 7.5 Kyears gets more pronounced. This 

7 effect is less evident for precipitation, low diversity is predicted when the pre- 

8 cipitation is higher on the first period than the averaged precipitation curves 

9 on all the sample. When the predicted diversity is higher than 0.24 there seems 

10 to be no effect of precipitation on its the level. 

11 When the change of climate during the Holocene (12 Kyears to present) is 

12 significant the diversity is higher. This mostly concerns northern and west- 

13 ern Europe. This is coherent with previous studies [2|. After 12 Kyears and 

14 throughout the Holocene the climate was no longer uniform all over Europe. 

15 The largest mismatch between NW and SE Europe occurred around 9 Kyears 

16 and 5 Kyears. By 5 Kyears, all deciduous tree taxa (such as beech) were outside 

17 their glacial refugia. 



18 5 Conclusion 

19 The classical linear functional model has been extended in a straightforward 

20 manner to the case of two functional predictors with an interaction term, and 

21 with spatially correlated residuals. Such a model applied to complex paleoe- 

22 cological and biodiversity data emphasizes an interesting relationship between 

23 climate change and genetic diversity: diversity is higher when the change in 

24 climate (mostly temperature) during the Holocene (12 Kyears to present) 

25 was sizeable and lower when temperature and precipitation are both globally 

26 higher over the whole period. This model may be improved in several ways. 

27 The spatial effect may be handled in other ways, by means of a mixed struc- 

28 ture or with other kinds of correlation matrix structure. In this first attempt 

29 we have neglected the random structure and the correlation of the predic- 
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1 tors. Taking into account these two characteristics should give a better way 

2 to understand the real effect of climate on biodiversity. 
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